
*===============================================================================
* Replication Do-file
* Project: The Impact of the Tigray War on Child Education and Labor in Ethiopia
* Data Sources: Ethiopian Socioeconomic Survey (ESS) and ACLED Conflict Data
* Authors     : Dainn Wie, Demeke Yemareshet
* Affiliation : National Graduate Institute for Policy Studies, Tokyo, Japan
* Contact     : wie-dainn@grips.ac.jp, hailuyemar@gmail.com
* Description: Cleaning and analysis of ESS and ACLED data for main results
*===============================================================================

*===============================================================================
* Programs to be installed
* ssc install geodist
* ssc install reghdfe
* ssc install ftools
* ssc install asdoc
*===============================================================================


*---------------------------
* 0. Set-up Environment
*---------------------------

* Set working directory
* Uncomment the line that matches the user running the file


*cd "C:\Users\wie-dainn\Dropbox\Yema\Analysis.CH2\Data"   // Dainn's path
cd "C:\Users\hailu\Dropbox\Yema\Analysis.CH2\Data"           // Yema's path


* Turn off pauses and close any open logs
set more off
cap log close


*---------------------------
* Load and Clean ESS 2018 Data
*---------------------------

use "2018_ESS/sect1_hh_w4.dta", clear   // Load household roster (one row per individual)

* Check for duplicate observations at the individual level
duplicates report household_id individual_id
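
* A stricter variant of the check above (a sketch added for illustration, not
* part of the original workflow): isid exits with an error unless the two IDs
* jointly identify every row, which the 1:1 merges below require anyway.
* missok tolerates missing ID values, mirroring how merge treats them.
isid household_id individual_id, missok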



*---------------------------
* Merge ESS 2018 Sections 1, 2, and 4
*---------------------------


* Merge SECTION 1 (Household Roster) with SECTION 2 (Education)
merge 1:1 household_id individual_id using "2018_ESS/sect2_hh_w4.dta"

* Keep relevant variables for analysis
keep household_id individual_id ea_id saq14 pw_w4 saq01 saq02 saq03 saq04 saq05 ///
     saq06 saq07 saq08 s1q00 s1q01 s1q02 s1q03a s1q03b s1q09 s1q12 s1q16 s1q20 ///
     s2q04 s2q05 s2q05_os s2q06 s2q07 s2q08 s2q08_os s2q09 s2q10 s2q10_os ///
     s2q11 s2q12 s2q12_os s2q13 s2q13_os s2q14 s2q15 s2q16a s2q16b s2q17 s2q18 ///
     s2q19 s1q05 s1q06

* Merge with SECTION 4 (e.g., Employment/Activities)
merge 1:1 household_id individual_id using "2018_ESS/sect4_hh_w4.dta"

* Rename variables for clarity
rename ea_id                        enumeration_areaid
rename saq14                        rural_urban
rename saq01                        region
rename saq02                        zone
rename saq03                        district
rename saq06                        kebele
rename saq07                        enumeration_code
rename s1q01                        relation_head
rename s1q02                        sex
rename s1q12                        birth_region
rename pw_w4                        hh_weight2018
rename s1q03a                       age_years2018
rename s1q03b                       age_months2018
rename s1q05                        still_member2018
rename s1q06                        weeks_away2018
rename s1q09                        marital_status2018
rename s1q16                        father_highest_educ2018
rename s1q20                        mother_highest_educ2018
rename s2q04                        ever_attended_school2018
rename s2q05                        reason_neverattend_school2018
rename s2q05_os                     other_neveratt_sch2018
rename s2q06                        highest_educ_completed2018
rename s2q07                        currently_attending_school2018
rename s2q08                        why_current_notschool2018
rename s2q08_os                     otherreas_current_not2018
rename s2q09                        level_enrolled2018
rename s2q11                        absent_school2018
rename s2q12                        reason_absent2018
rename s2q14                        minutes_school2018
rename s2q19                        plan_school_nextyear2018


* Drop merge variable created by second merge
drop _merge

* Save cleaned and merged dataset
save "2018_ESS_SEC1_4.dta", replace



*---------------------------
* Merge with Section 9: Household Shocks
*---------------------------

* Load shock data (Section 9)
use "2018_ESS/sect9_hh_w4.dta", clear

* Keep only relevant variables
keep household_id shock_type s9q01
rename s9q01 s9

* Reshape shock data from long to wide format
reshape wide s9, i(household_id) j(shock_type)

* Save reshaped shock data
save "2018_sec_9.dta", replace


*---------------------------
* Merge with Cleaned Section 1–4 Dataset
*---------------------------

use "2018_ESS_SEC1_4.dta", clear

* Merge shock data by household ID
merge m:1 household_id using "2018_sec_9.dta"
drop _merge


*---------------------------
* Merge with Household Geovariables (GPS) Data
*---------------------------

merge m:1 household_id using "2018_ESS/ETH_HouseholdGeovariables_Y4.dta"
drop _merge

* Drop unneeded geographic and environmental variables
drop ea_id ssa_aez09 twi sq1 sq2 sq3 sq4 sq5 sq6 sq7 af_bio_1 slopepct ///
     srtm1k popdensity h2018_wetQstart h2019_wetQstart wetQ_avgstart ///
     wetQ_avg h2018_ndvi_avg h2018_ndvi_max h2019_ndvi_avg h2019_ndvi_max ///
     ndvi_avg ndvi_max saq04 saq05 saq08 s1q00 s2q10 s2q10_os s2q12_os ///
     s2q13 s2q13_os s2q15 s2q16a s2q16b s2q17 s2q18 dist_road dist_market ///
     dist_border dist_popcenter dist_admhq af_bio_8 af_bio_12 af_bio_13 ///
     af_bio_16 cropshare h2018_tot h2018_wetQ h2019_tot h2019_wetQ anntot_avg

	 
* Rename location variables
rename lat_mod latitude_modified
rename lon_mod longitude_modified

* Drop observations in Region 1 (Tigray)
drop if region == 1


*---------------------------
* Rename Variables for Year Tagging
*---------------------------

* Drop unused variables
drop s4q00 s4q01 s4q02 s4q03a s4q03b s4q04a s4q04b s4q07 s4q16 s4q17 ///
     s4q18 s4q20 s4q21 s4q22 s4q24 s4q25_1 s4q25_2 s4q25_os s4q29 s4q30 ///
     s4q31 s4q31_os

* Rename variables to include year suffix (2018)
rename s4q05         s4q052018
rename s4q06         s4q062018
rename s4q10         s4q102018
rename s4q11         s4q112018
rename s4q12         s4q122018
rename s4q13         s4q132018
rename s4q14         s4q142018
rename s4q15         s4q152018
rename s4q18_os      s4q18_os2018
rename s4q19         s4q192018
rename s4q23         s4q232018
rename s4q26         s4q262018
rename s4q27         s4q272018
rename s4q28         s4q282018
rename s4q28_os      s4q28_os2018
rename s4q32         s4q322018
rename s4q33         s4q332018
rename s4q33b        s4q33b2018
rename s4q34a        s4q34a2018
rename s4q34b        s4q34b2018
rename s4q34c        s4q34c2018
rename s4q34d        s4q34d2018
rename s4q34         s4q342018
rename s4q34_os      s4q34_os2018
rename s4q35         s4q352018
rename s4q36         s4q362018
rename s4q37         s4q372018
rename s4q38         s4q382018
rename s4q39         s4q392018
rename s4q40         s4q402018
rename s4q41         s4q412018
rename s4q42         s4q422018
rename s4q42_os      s4q42_os2018
rename s4q43         s4q432018
rename s4q44         s4q442018
rename s4q45         s4q452018
rename s4q46         s4q462018
rename s4q47         s4q472018
rename s4q48         s4q482018
rename s4q49         s4q492018
rename s4q50         s4q502018
rename s4q51         s4q512018
rename s4q52         s4q522018
rename s4q53         s4q532018
rename s4q54         s4q542018

*---------------------------
* Save Final Cleaned ESS 2018 Dataset
*---------------------------

save "ESS_2018_hh.dta", replace



*---------------------------
* Merge with Community Survey Data
*---------------------------

* Load community-level survey data
use "2018_ESS/sect04_com_w4.dta", clear

* Keep only relevant variables
keep ea_id saq01 cs4q37 cs4q18 cs4q21

* Rename for clarity and consistency
rename ea_id  enumeration_areaid
rename saq01  region
rename cs4q37 hospital2018
rename cs4q18 primary2018
rename cs4q21 secondary2018

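* Merge with household-level data. Community-only rows (for example, EAs whose
* households were all dropped above) carry no household or individual ID, so
* they are removed here rather than left for the later duplicates drop
merge 1:m region enumeration_areaid using "ESS_2018_hh.dta"
drop if _merge == 1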

* Rename selected shock variables for clarity
rename s91  death_hh_member2018  
rename s96  drought2018
rename s910 crop_damage2018  
rename s914 death_livestock2018 
rename s916 theft2018
rename s93  death_member2018
rename s95  loss_nonfarm_jobs2018
rename s912 unusual_increpri_food2018
rename s913 unusual_increpri_inputs2018

* Drop unused shock variables
drop s92 s94 s97 s98 s99 s911 s915 s917 s918 s919 s920

* Save final dataset after merging with community survey
save "ESS_2018_hh.dta", replace




*******************************************
* ACLED Conflict Data Cleaning (2020–2022)
*******************************************

clear all

* Import ACLED Excel file (adapt path as needed)
* import excel "C:\Users\wie-dainn\Desktop\Dropbox\Yema\ZZ_Second Idea\Data\ACLED\Africa_1997-2023_Jun16.xlsx", ///
*     sheet("Sheet1") firstrow
import excel "ACLED/Africa_1997-2023_Jun16.xlsx", sheet("Sheet1") firstrow clear


* Keep only Ethiopia and conflict events from Nov 2020 to Dec 2022

keep if COUNTRY == "Ethiopia"
keep if EVENT_DATE >= td(01nov2020) & EVENT_DATE <= td(31dec2022)

* Keep only regions affected by the Tigray war
drop REGION
rename ADMIN1 region
keep if inlist(region, "Tigray", "Amhara", "Afar")


* Label relevant variables
label var region      "Administrative Region"
label var ADMIN2      "Administrative Zone"
label var ADMIN3      "Administrative District"

* Collapse by location: sum fatalities and count events
* (the count covers rows with non-missing FATALITIES)
collapse (sum) total_fatalities = FATALITIES (count) total_incidents = FATALITIES, ///
    by(LATITUDE LONGITUDE region ADMIN3 LOCATION)

*---------------------------
* Generate Unique ID for Locations
*---------------------------

gen unique_id = LOCATION + "_" + string(LATITUDE, "%10.6f") + "_" + string(LONGITUDE, "%10.6f")
egen id = group(unique_id)
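
* Optional sanity check (a sketch added for illustration, not part of the
* original workflow). The distance loops further down hard-code 420 conflict
* sites, so it may help to confirm that the number of distinct location IDs
* is 420 before reshaping. With fewer sites those loops stop with an error;
* with more, the extra sites would be silently ignored.
* n_sites is a hypothetical local name.
quietly summarize id
local n_sites = r(max)
display "Distinct conflict locations: `n_sites'"
assert `n_sites' == 420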

* Keep only relevant variables and rename for clarity
keep id region ADMIN3 LOCATION LATITUDE LONGITUDE total_fatalities total_incidents
rename ADMIN3 district
gen n = 1   // Temporary ID for reshaping


* A: Appendix Tables
* Table A1: Summary Statistics of Conflicts from ACLED Data

summarize total_fatalities total_incidents
asdoc summarize total_fatalities total_incidents, ///
    label replace save(table_A1.doc)

*---------------------------
* Reshape to Wide Format by Conflict Location ID
*---------------------------

reshape wide LATITUDE LONGITUDE region district LOCATION total_fatalities total_incidents, ///
    i(n) j(id)

* Save cleaned ACLED dataset
save "ACLED/cleaned.dta", replace



***** Merging ESS 2018 Household Data with ACLED Conflict Data

* Load Ethiopian Socioeconomic Survey *
use "ESS_2018_hh.dta", clear

gen n = 1   // Dummy key for merging with conflict data

drop _merge

* Merge with Cleaned ACLED Conflict Data *
merge m:1 n using "ACLED/cleaned.dta"

drop n



*  Calculate geodesic distance to each conflict site *

* Note: `geodist` must be installed. Run once:
* ssc install geodist

* Loop over all conflict sites (assuming 420 in total)
forval i = 1/420 {
    geodist latitude_modified longitude_modified LATITUDE`i' LONGITUDE`i', ///
        gen(distance`i')
}
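
* Optional check (a sketch added for illustration, not in the original):
* geodist reports distances in kilometers by default, so the distance to the
* nearest conflict site can be summarized here before the 20 km and 50 km
* thresholds are applied later. nearest_site_km is a hypothetical name and
* is dropped again immediately so the saved dataset is unchanged.
egen nearest_site_km = rowmin(distance1-distance420)
summarize nearest_site_km, detail
drop nearest_site_km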

* Add survey year as a variable
gen year_2018 = 2018


*  Save merged dataset for analysis   *
save "ESS_2018_hh_cleaned.dta", replace





*--------------------------------------------------*
* Load and Clean ESS 2021/22 Survey Data
*--------------------------------------------------*

use "2021_ESPS/sect1_hh_w5.dta", clear

* Check for duplicate household and individual IDs
duplicates report household_id individual_id

* Generate unique ID combining household and individual IDs
gen unique_id = household_id + string(individual_id, "%2.0f")

* Merge SECTION 1 (Household Roster) with SECTION 2 (Education)
merge 1:1 household_id individual_id using "2021_ESPS/sect2_hh_w5.dta"

* Keep relevant variables for analysis
keep household_id individual_id ea_id saq14 saq01 saq02 saq03 saq04 saq05 saq06 saq07 saq08 ///
     s1q00 s1q01 s1q02 s1q03a s1q03b s1q09 s1q12 s1q16 s1q20 s2q04 s2q05 s2q05_os ///
     s2q06 s2q07 s2q08 s2q08_os s2q09 s2q10 s2q10_os s2q11 s2q12 s2q12_os s2q13 ///
     s2q13_os s2q14 s2q15 s2q16a s2q16b s2q17 s2q18 s2q19 pw_w5 s1q05 s1q06


* Merge SECTION 1 & 2 with SECTION 4

merge 1:1 household_id individual_id using "2021_ESPS/sect4_hh_w5.dta"

* Rename variables for clarity

rename ea_id             enumeration_areaid
rename saq14             rural_urban
rename saq01             region
rename saq02             zone
rename saq03             district
rename saq06             kebele
rename saq07             enumeration_code
rename saq08             householdid
rename s1q01             relation_head
rename s1q02             sex
rename s1q12             birth_region

rename pw_w5                hh_weight2021
rename s1q03a               age_years2021
rename s1q03b               age_months2021
rename s1q09                marital_status2021
rename s1q16                father_highest_educ2021
rename s1q20                mother_highest_educ2021
rename s2q04                ever_attended_school2021
rename s1q05                still_member2021
rename s1q06                weeks_away2021
rename s2q05                reason_neverattend_school2021
rename s2q05_os             other_neveratt_sch2021
rename s2q06                highest_educ_completed2021
rename s2q07                currently_attending_school2021
rename s2q08                why_current_notschool2021
rename s2q08_os             otherreas_current_not2021
rename s2q09                level_enrolled2021
rename s2q11                absent_school2021
rename s2q12                reason_absent2021
rename s2q14                minutes_school2021
rename s2q19                plan_school_nextyear2021


* Save cleaned and merged data

save "2021_ESS_SEC1_4", replace


* Merge Section 9 (Shocks) with Main Individual Dataset

* Load section 9 (shocks) data
use "2021_ESPS/sect9_hh_w5.dta", clear

* Keep relevant variables only
keep household_id shock_type s9q01
rename s9q01 s9

* Reshape shock data from long to wide format
reshape wide s9, i(household_id) j(shock_type)

* Save the reshaped section 9 data
save "2021_sec_9.dta", replace


* Load main individual-level dataset and merge

use "2021_ESS_SEC1_4.dta", clear

* Drop old _merge variable if exists (in case of previous merge)
capture drop _merge

* Merge with reshaped section 9 data using household_id
merge m:1 household_id using "2021_sec_9.dta"

* (Optional) Inspect merge result
tab _merge
drop if _merge == 2   // drop unmatched observations from section 9
drop _merge


* Rename shock variables for clarity

rename s91   death_hh_member2021
rename s93   death_member2021
rename s95   loss_nonfarm_jobs2021
rename s96   drought2021
rename s910  crop_damage2021
rename s912  unusual_increase_foodprice2021
rename s913  unusual_increase_inputprice2021
rename s914  death_livestock2021
rename s916  theft2021


* Drop unused shock variables

drop s92 s94 s97 s98 s99 s911 s915 s917 s918 s919


* Merge with Household Geovariables data/GPS
merge m:1 household_id using "2021_ESPS/eth_householdgeovariables_y5.dta"

* Inspect merge result
tab _merge

drop _merge

* Drop unnecessary variables
drop ssa_aez09 twi sq1 sq2 sq3 sq4 sq5 sq6 sq7 afmnslp_pct srtm_1k popdensity ///
     h2021_wetQstart h2021_wetQ wetQ_avgstart wetQ_avg c2_h2021_sen c2_h2021_grn ///
     c2_h2021_evimax c2_h2021_eviarea c2_sen_avg c2_grn_avg c2_evimax_avg c2_eviarea_avg ///
     h2021_sen h2021_grn h2021_evimax h2021_eviarea sen_avg grn_avg evimax_avg ///
     saq04 saq05 s1q00 s2q10 s2q10_os s2q12_os s2q13 s2q13_os s2q15 s2q16a s2q16b s2q17 s2q18 ///
     dist_road dist_market dist_border dist_popcenter dist_admhq ///
     af_bio_1_x af_bio_8_x af_bio_12_x af_bio_13_x af_bio_16_x ///
     cropshare landcov pct_urban_cluster pct_urban_center h2021_tot anntot_avg eviarea_avg suppress

* Rename GPS coordinates for clarity
rename lat_dd_mod latitude_modified
rename lon_dd_mod longitude_modified



* Drop unnecessary variables
drop s4q00 s4q01 s4q02 s4q03a s4q03b s4q04a s4q04b s4q07 s4q16 s4q17 s4q18 ///
     s4q20 s4q21 s4q22 s4q24 s4q25_1 s4q25_2 s4q25_os s4q29 s4q30 s4q31 ///
     s4q31_os

* Rename variables with 2021 suffix for clarity
foreach var in 05 06 10 11 12 13 14 15 18_os 19 23 26 27 28 28_os ///
               32 33 33b 34a 34b 34c 34d 34 34_os 35 36 37 38 39 40 ///
               41 42 42_os 43 44 45 46 47 48 49 50 51 52 53 54 {
    rename s4q`var' s4q`var'2021
}


save "ESS_2021_hh", replace


* Load and keep relevant community-level variables

use "2021_ESPS/sect04_com_w5.dta", clear
keep ea_id saq01 cs4q37 cs4q18 cs4q21

* Rename for clarity
rename ea_id enumeration_areaid
rename saq01 region
rename cs4q37 hospital2021
rename cs4q18 primary2021
rename cs4q21 secondary2021

* Merge with household-level dataset
merge 1:m enumeration_areaid using "ESS_2021_hh"

* Check merge success
tab _merge

* Save merged file
save "ESS_2021_hh", replace



***********************  merge ESS 2021/2022 with ACLED (Conflict dataset) *

use "ESS_2021_hh", clear

gen n = 1

drop _merge
merge m:1 n using "ACLED/cleaned.dta"
drop n


* Calculate distance from households to each conflict site
forval i = 1/420 {
    geodist latitude_modified longitude_modified LATITUDE`i' LONGITUDE`i', gen(distance`i')
}

drop _merge

gen year_2021 = 2021

save "ESS_2021_hh_cleaned.dta", replace







*********** Merge 2018 and 2021 ESS Data ***********

use "ESS_2018_hh_cleaned.dta", clear

* List duplicates on household_id and individual_id
duplicates list household_id individual_id

* Drop the old merge indicator, then keep one row per household_id-individual_id pair
drop _merge
duplicates drop household_id individual_id, force


* Merge 2018 and 2021 data by household_id and individual_id (one-to-one)

merge 1:1 household_id individual_id using "ESS_2021_hh_cleaned.dta", generate(merge_flag)


* Harmonize the 2021 shock names with their 2018 counterparts so both waves share a reshape stub
rename unusual_increase_foodprice2021 unusual_increpri_food2021
rename unusual_increase_inputprice2021 unusual_increpri_inputs2021

* Reshape data from wide to long to combine 2018 and 2021 into one dataset
reshape long hh_weight age_years age_months still_member weeks_away marital_status ///
    father_highest_educ mother_highest_educ ever_attended_school reason_neverattend_school ///
    other_neveratt_sch highest_educ_completed currently_attending_school why_current_notschool ///
    otherreas_current_not level_enrolled absent_school reason_absent minutes_school ///
    plan_school_nextyear s4q05 s4q06 s4q10 s4q11 s4q12 s4q13 s4q14 s4q15 s4q18_os s4q19 ///
    s4q23 s4q26 s4q27 s4q28 s4q28_os s4q32 s4q33 s4q33b s4q34a s4q34b s4q34c s4q34d ///
    s4q34 s4q34_os s4q35 s4q36 s4q37 s4q38 s4q39 s4q40 s4q41 s4q42 s4q42_os s4q43 ///
    s4q44 s4q45 s4q46 s4q47 s4q48 s4q49 s4q50 s4q51 s4q52 s4q53 s4q54 hospital primary ///
    secondary death_hh_member drought crop_damage death_livestock theft death_member ///
    loss_nonfarm_jobs unusual_increpri_food unusual_increpri_inputs, ///
    i(household_id individual_id) j(Year)
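
* Sanity check after reshaping (a sketch added for illustration, not in the
* original): the long data should hold exactly one row per individual and
* survey year, and Year should take only the values 2018 and 2021.
isid household_id individual_id Year, missok
tabulate Year
assert inlist(Year, 2018, 2021)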


* Save the merged and reshaped dataset
save "ESS_2018_2021_hh_acled.dta", replace



*** Load the merged and reshaped ESS dataset ***

use "ESS_2018_2021_hh_acled.dta", clear

* Create a unique person ID combining household and individual IDs
egen pid = group(household_id individual_id)

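* Count rows per individual. reshape long creates a row for each Year for every
* individual, so this equals 2 even for people observed in only one wave;
* merge_flag records which waves an individual actually appears in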
bysort pid (Year): gen obs_count = _N

* Tabulate number of observations per individual
tabulate obs_count


* Create post-treatment indicator for year 2021
gen post = (Year == 2021)


* Create binary variable for attending school: 1 = Yes, 0 = No
gen attending_school = .
replace attending_school = 1 if currently_attending_school == 1
replace attending_school = 0 if currently_attending_school == 2

* Generate binary indicators for conflicts within 20 km
foreach i of numlist 1/420 {
    gen distance_binary_20`i' = (distance`i' <= 20)
}


* Generate binary indicators for conflicts within 50 km
foreach i of numlist 1/420 {
    gen distance_binary_50`i' = (distance`i' <= 50)
}

* Count number of conflicts within 20 km
gen count_distances_20 = 0
foreach i of numlist 1/420 {
    replace count_distances_20 = count_distances_20 + distance_binary_20`i'
}

* Count number of conflicts within 50 km
gen count_distances_50 = 0
foreach i of numlist 1/420 {
    replace count_distances_50 = count_distances_50 + distance_binary_50`i'
}


******** Total incidents for all conflict areas within 20km and 50km ********

* Initialize total incidents variables
gen Total_incidents_20 = 0
gen Total_incidents_50 = 0

* Sum total incidents for conflicts within 20km
forval i = 1/420 {
    replace Total_incidents_20 = Total_incidents_20 + total_incidents`i' if distance_binary_20`i' == 1
}

* Sum total incidents for conflicts within 50km
forval i = 1/420 {
    replace Total_incidents_50 = Total_incidents_50 + total_incidents`i' if distance_binary_50`i' == 1
}


******** Total fatalities for all conflict areas within 20km and 50km ********

* Initialize total fatalities variables

gen Total_fatalities_20 = 0
gen Total_fatalities_50 = 0

* Sum total fatalities for conflicts within 20km
forval i = 1/420 {
    replace Total_fatalities_20 = Total_fatalities_20 + total_fatalities`i' if distance_binary_20`i' == 1
}

* Sum total fatalities for conflicts within 50km
forval i = 1/420 {
    replace Total_fatalities_50 = Total_fatalities_50 + total_fatalities`i' if distance_binary_50`i' == 1
}





 
********** Drop intermediate distance and conflict variables **********

drop distance_binary*
drop total_fatalities*
drop total_incidents*
drop distance*


*** Create dummy variables for conflict exposure based on incidents and fatalities ***

* Exposure within 20km: exposed if there is at least 1 incident and more than 5 fatalities
gen exposed_dummy_20 = (Total_incidents_20 > 0) & (Total_fatalities_20 > 5)

* Exposure within 50km: same criteria
gen exposed_dummy_50 = (Total_incidents_50 > 0) & (Total_fatalities_50 > 5)
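
* Quick consistency check on the two exposure definitions (a sketch added for
* illustration, not in the original). The 50 km totals include every site
* within 20 km, so anyone exposed at 20 km should also be exposed at 50 km.
tabulate exposed_dummy_20 exposed_dummy_50
assert exposed_dummy_50 == 1 if exposed_dummy_20 == 1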

*** Create interaction terms with post-treatment indicator (year == 2021) ***

gen interaction_dummy_20 = exposed_dummy_20 * post
gen interaction_dummy_50 = exposed_dummy_50 * post

* Interaction terms with continuous conflict measure (total incidents) and post indicator
gen interaction_continuous_20 = Total_incidents_20 * post
gen interaction_continuous_50 = Total_incidents_50 * post


* Create dummy variables for conflict exposure based on fatalities

gen exposed_dumfatalit_20 = (Total_fatalities_20 > 0)
gen exposed_dumfatalit_50 = (Total_fatalities_50 > 0)

* Create interaction terms with post-treatment indicator (year == 2021)

gen interaction_dumfatalit_20 = exposed_dumfatalit_20 * post
gen interaction_dumfatalit_50 = exposed_dumfatalit_50 * post

* Interaction terms with continuous fatality counts and post indicator

gen interaction_numberfatal_20 = Total_fatalities_20 * post
gen interaction_numberfatal_50 = Total_fatalities_50 * post
 
 
 
 
***** Create a binary sex variable: male = 1 if sex == 1, else 0 *****

gen sex_new = 1 if sex == 1
replace sex_new = 0 if sex == 2

gen male = (sex_new == 1)

***** Interaction terms with male and conflict exposure *****

gen inter_dummy_20_sex = interaction_dummy_20 * male
gen inter_dummy_50_sex = interaction_dummy_50 * male

gen inter_cont_20_sex = interaction_continuous_20 * male
gen inter_cont_50_sex = interaction_continuous_50 * male

***** Rename variables for clarity *****

rename interaction_dummy_20 Exposure_20km_dummy
rename interaction_dummy_50 Exposure_50km_dummy
rename interaction_continuous_20 Exposure_20km_continuous
rename interaction_continuous_50 Exposure_50km_continuous

rename interaction_numberfatal_20 Exposure_20km_fatalities
rename interaction_numberfatal_50 Exposure_50km_fatalities

rename interaction_dumfatalit_20 Exposure_20km_fatalitdummy
rename interaction_dumfatalit_50 Exposure_50km_fatalitdummy

***** Label variables *****

label variable Exposure_20km_dummy "Conflict within 20km"
label variable Exposure_50km_dummy "Conflict within 50km"
label variable Exposure_20km_continuous "Number of conflicts within 20km"
label variable Exposure_50km_continuous "Number of conflicts within 50km"

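label variable Exposure_20km_fatalities "Number of fatalities within 20km"
label variable Exposure_50km_fatalities "Number of fatalities within 50km"

label variable Exposure_20km_fatalitdummy "Any fatalities within 20km"
label variable Exposure_50km_fatalitdummy "Any fatalities within 50km"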

***** Save final dataset *****
save "ESS_2018_2021_hh_acled_final.DTA", replace



* Create a variable for household size (number of individuals per household)
use "ESS_2018_2021_hh_acled_final.DTA", clear

* The data are long (one row per individual and Year), so count rows within
* household and Year; counting within household_id alone doubles the size
bysort household_id Year: gen household_size = _N

* Keep only children aged 7 to 18 years (school-age children).
* Rows with missing age are dropped too, since Stata treats missing as larger than any number
drop if age_years < 7 | age_years > 18


* Generate a dummy for school absence last semester (1 = absent more than a week, 0 otherwise)

gen absent_school_last = .
replace absent_school_last = 1 if absent_school == 1
replace absent_school_last = 0 if absent_school == 2


* Generate a dummy for planning to attend school next year (1 = yes, 0 = no)

gen plan_school = .
replace plan_school = 1 if plan_school_nextyear == 1
replace plan_school = 0 if plan_school_nextyear == 2


**** Include Father's Education: Create interaction variables (Conflict Exposure * Father's Education)

* Create a binary indicator for high SES based on father's highest education (values between 7 and 35)

gen father_high_SES = inrange(father_highest_educ, 7, 35)
label variable father_high_SES "High SES (Father's Education)"

* Interaction terms with conflict exposure dummies (20km and 50km)

gen interaction_father_20dum = Exposure_20km_dummy * father_high_SES
label variable interaction_father_20dum "Conflict within 20km * High SES (Father's Education)"

gen interaction_father_50dum = Exposure_50km_dummy * father_high_SES
label variable interaction_father_50dum "Conflict within 50km * High SES (Father's Education)"

* Interaction terms with continuous measures of conflict exposure

gen interaction_father_20cont = Exposure_20km_continuous * father_high_SES
label variable interaction_father_20cont "Number of conflicts within 20km * High SES (Father's Education)"

gen interaction_father_50cont = Exposure_50km_continuous * father_high_SES
label variable interaction_father_50cont "Number of conflicts within 50km * High SES (Father's Education)"



**** Include Mother's Education: Create interaction variables (Conflict Exposure * Mother's Education)

* Create a binary indicator for high SES based on mother's highest education (values between 7 and 35)

gen mother_high_SES = inrange(mother_highest_educ, 7, 35)
label variable mother_high_SES "High SES (Mother's Education)"

* Interaction terms with conflict exposure dummies (20km and 50km)

gen interaction_mother_20dum = Exposure_20km_dummy * mother_high_SES
label variable interaction_mother_20dum "Conflict within 20km * High SES (Mother's Education)"

gen interaction_mother_50dum = Exposure_50km_dummy * mother_high_SES
label variable interaction_mother_50dum "Conflict within 50km * High SES (Mother's Education)"

* Interaction terms with continuous measures of conflict exposure

gen interaction_mother_20cont = Exposure_20km_continuous * mother_high_SES
label variable interaction_mother_20cont "Number of conflicts within 20km * High SES (Mother's Education)"

gen interaction_mother_50cont = Exposure_50km_continuous * mother_high_SES
label variable interaction_mother_50cont "Number of conflicts within 50km * High SES (Mother's Education)"



****** Child labor variables ******


* Employment status in last 12 months

gen employed_12months = 1 if s4q33 == 1
replace employed_12months = 0 if s4q33 == 2
replace employed_12months = 0 if missing(employed_12months)

* Paid employment in last 12 months

gen employed_pay_12months = 1 if s4q33b == 1
replace employed_pay_12months = 0 if s4q33b == 2
replace employed_pay_12months = 0 if missing(employed_pay_12months)

* Employment compensated by cash or food

gen employed_cash_food = 1 if s4q45 == 1
replace employed_cash_food = 0 if s4q45 == 2

* Casual labor work indicator

gen casual_labour_work = 1 if s4q48 == 1
replace casual_labour_work = 0 if s4q48 == 2

* Worked for free for others

gen worked_free_others = 1 if s4q51 == 1
replace worked_free_others = 0 if s4q51 == 2

* Free labor for public or community
gen free_labor_public = 1 if s4q53 == 1
replace free_labor_public = 0 if s4q53 == 2


****** Rename distance variables ******

rename hospital distance_hospital
rename primary distance_primary
rename secondary distance_secondary


****** Weekly Work and Job Characteristics ******

* Worked in agriculture during the last week

gen work_week_agr = 1 if s4q05 == 1
replace work_week_agr = 0 if s4q05 == 2

* Worked in non-agriculture during the last week
* Note: s4q08 was not given a year suffix or reshaped above, so it does not vary by Year here

gen work_week_nonagr = 1 if s4q08 == 1
replace work_week_nonagr = 0 if s4q08 == 2

* Part-time job indicator

gen part_time = 1 if s4q10 == 1
replace part_time = 0 if s4q10 == 2

* Paid salary work indicator

gen salary_work = 1 if s4q12 == 1
replace salary_work = 0 if s4q12 == 2

* Unpaid apprenticeship indicator

gen unpaid_apprenticeship = 1 if s4q14 == 1
replace unpaid_apprenticeship = 0 if s4q14 == 2



* Drop unnecessary variables

drop s4q05 s4q06 s4q10 s4q11 s4q12 s4q13 s4q14 s4q15 s4q18_os s4q19 s4q23 ///
     s4q26 s4q27 s4q28 s4q28_os s4q32 s4q33 s4q33b s4q34a s4q34b s4q34c ///
     s4q34d s4q34 s4q34_os s4q35 s4q36 s4q37 s4q38 s4q39 s4q40 s4q41 s4q42 ///
     s4q42_os s4q43 s4q44 s4q45 s4q46 s4q47 s4q48 s4q49 s4q50 s4q51 s4q52 ///
     s4q53 s4q54

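* Drop the wide ACLED site variables by name; a range such as
* region1-region420 depends on variable order and can leave some behind
forvalues i = 1/420 {
    drop region`i' district`i' LOCATION`i' LATITUDE`i' LONGITUDE`i'
}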


****** Save Final Dataset ******

save "ESS_2018_2021_hh_acled_final.DTA", replace





********************************************
* Placebo Test: Using 2013 and 2015 ESS Data
********************************************

************************
* Cleaning 2013 ESS Data
************************

use "ETH_2013_ESS_v03_M_STATA/sect1_hh_w2.dta", clear

* Check for duplicate entries based on household and individual IDs
duplicates report household_id2 individual_id2


* Merge SECTION 1 (Household Roster) with SECTION 2 (Education)
merge 1:1 household_id2 individual_id2 using "ETH_2013_ESS_v03_M_STATA/sect2_hh_w2.dta"

* Drop unnecessary variables
drop hh_s1q04d hh_s1q04e hh_s1q04f hh_s1q04g_1 hh_s1q04g_2 hh_s1q04g_3 hh_s1q04h hh_s1q06 


* Rename variables for consistency
rename ea_id enumeration_areaid
rename ea_id2 enumeration_areaid2
rename saq01 region
rename saq02 zone
rename saq03 district
rename saq06 kebele
rename saq07 enumeration_code
rename hh_s1q02 relation_head
rename hh_s1q03 sex
rename hh_s1q04_a age_years2013
rename hh_s1q08 marital_status2013
rename hh_s1q15 father_highest_educ2013
rename hh_s1q19 mother_highest_educ2013
rename hh_s2q03 ever_attended_school2013
rename hh_s2q06 currently_attending_school2013
rename pw2 hh_weight2013
rename hh_s2q18 plan_school_nextyear2013

* Keep relevant variables only
keep household_id household_id2 rural hh_weight2013 individual_id individual_id2 ///
     enumeration_areaid enumeration_areaid2 region zone district kebele enumeration_code ///
     relation_head sex age_years2013 marital_status2013 father_highest_educ2013 ///
     mother_highest_educ2013 ever_attended_school2013 currently_attending_school2013 ///
     plan_school_nextyear2013

	 
* Merge with SECTION 4: EMPLOYMENT
merge 1:1 household_id2 individual_id2 using "ETH_2013_ESS_v03_M_STATA/sect4_hh_w2.dta"

* Keep only necessary variables
keep household_id household_id2 rural hh_weight2013 individual_id individual_id2 ///
    enumeration_areaid enumeration_areaid2 region zone district kebele enumeration_code ///
    relation_head sex age_years2013 marital_status2013 father_highest_educ2013 ///
    mother_highest_educ2013 ever_attended_school2013 currently_attending_school2013 ///
    plan_school_nextyear2013 hh_s4q31 hh_s4q34 hh_s4q37
	
* Generate employment-related dummy variables
gen employed_cash_food = 1 if hh_s4q31 == 1
replace employed_cash_food = 0 if hh_s4q31 == 2

gen casual_labour_work = 1 if hh_s4q34 == 1
replace casual_labour_work = 0 if hh_s4q34 == 2

gen worked_free_others = 1 if hh_s4q37 == 1
replace worked_free_others = 0 if hh_s4q37 == 2


rename employed_cash_food employed_cash_food2013
rename casual_labour_work casual_labour_work2013
rename worked_free_others worked_free_others2013


* Merge with Household Geovariables (GPS data)

merge m:1 household_id2 using "ETH_2013_ESS_v03_M_STATA/Pub_ETH_HouseholdGeovars_Y2.dta"

*** Keep essential variables including geolocation
*** (full 2013-suffixed names are used so the command does not rely on variable abbreviation)
keep household_id household_id2 rural hh_weight2013 individual_id individual_id2 enumeration_areaid enumeration_areaid2 ///
     region zone district kebele enumeration_code relation_head sex age_years2013 marital_status2013 ///
     father_highest_educ2013 mother_highest_educ2013 ///
     ever_attended_school2013 currently_attending_school2013 plan_school_nextyear2013 ///
     hh_s4q31 hh_s4q34 hh_s4q37 ///
     lat_dd_mod lon_dd_mod ///
     employed_cash_food2013 casual_labour_work2013 worked_free_others2013
	 
*** Rename geolocation variables
rename lat_dd_mod latitude_modified
rename lon_dd_mod longitude_modified
 
*** Drop observations from Region 1 (Tigray)
drop if region == 1

*** Save final dataset
save "ESS_2013_hh", replace


*******************************************************
* Merge ESS 2013 Household Data with ACLED Conflict Data
*******************************************************

*** Load cleaned ESS 2013 household data
use "ESS_2013_hh.dta", clear

* Create a constant merge key
gen n=1

* Merge with ACLED cleaned conflict dataset
merge m:1 n using "ACLED/cleaned.dta"
drop n


* Calculate geodistance from each household to each conflict site

forval i = 1/420 {
    geodist latitude_modified longitude_modified LATITUDE`i' LONGITUDE`i', gen(distance`i')
}

* Generate year variable 
gen year_2013 = 2013

* Save cleaned dataset
save "ESS_2013_hh_cleaned.DTA", replace


************************
* Cleaning 2015 ESS Data
************************

use "2015_ESS/Household/sect1_hh_w3", clear

* Check for duplicate entries based on household and individual IDs
duplicates report household_id2 individual_id2

* Merge SECTION 1 (Household Roster) with SECTION 2 (Education)
merge 1:1 household_id2 individual_id2 using "2015_ESS/Household/sect2_hh_w3"

* Drop unnecessary variables
drop hh_s1q04d hh_s1q04e hh_s1q04f hh_s1q04g_1 hh_s1q04g_2 hh_s1q04g_3 hh_s1q04h hh_s1q06 

* Rename variables for consistency
rename ea_id enumeration_areaid
rename ea_id2 enumeration_areaid2
rename saq01 region
rename saq02 zone
rename saq03 district
rename saq06 kebele
rename saq07 enumeration_code
rename hh_s1q02 relation_head
rename hh_s1q03 sex
rename hh_s1q04a age_years2015
rename hh_s1q08 marital_status2015
rename hh_s1q15 father_highest_educ2015
rename hh_s1q19 mother_highest_educ2015
rename hh_s2q03 ever_attended_school2015
rename hh_s2q06 currently_attending_school2015
rename pw_w3 hh_weight2015
rename hh_s2q18 plan_school_nextyear2015

* Keep relevant variables only
keep household_id household_id2 rural hh_weight2015 individual_id individual_id2 ///
     enumeration_areaid enumeration_areaid2 region zone district kebele enumeration_code ///
     saq08 relation_head sex age_years2015 marital_status2015 hh_s1q07 hh_s1q11 hh_s1q12 ///
     hh_s1q14 father_highest_educ2015 hh_s1q16 hh_s1q18 mother_highest_educ2015 ///
     ever_attended_school2015 currently_attending_school2015 plan_school_nextyear2015

	 
* Merge with Section 4: Employment
merge 1:1 household_id2 individual_id2 using "2015_ESS/Household/sect4_hh_w3.dta"

* Keep relevant variables after merge
keep household_id household_id2 rural hh_weight2015 individual_id individual_id2 ///
     enumeration_areaid enumeration_areaid2 region zone district kebele enumeration_code ///
     relation_head sex age_years2015 marital_status2015 hh_s1q07 hh_s1q11 hh_s1q12 ///
     hh_s1q14 father_highest_educ2015 hh_s1q16 hh_s1q18 mother_highest_educ2015 ///
     ever_attended_school2015 currently_attending_school2015 plan_school_nextyear2015 ///
     hh_s4q31 hh_s4q34 hh_s4q37

* Create employment-related dummy variables
gen employed_cash_food = 1 if hh_s4q31 == 1
replace employed_cash_food = 0 if hh_s4q31 == 2

gen casual_labour_work = 1 if hh_s4q34 == 1
replace casual_labour_work = 0 if hh_s4q34 == 2

gen worked_free_others = 1 if hh_s4q37 == 1
replace worked_free_others = 0 if hh_s4q37 == 2


rename employed_cash_food employed_cash_food2015
rename casual_labour_work casual_labour_work2015
rename worked_free_others worked_free_others2015

* Merge with Household Geovariables (GPS data)
merge m:1 household_id2 using "2015_ESS/Geovariables/ETH_HouseholdGeovars_y3.dta"

* Keep final relevant variables
keep household_id household_id2 rural hh_weight2015 individual_id individual_id2 ///
     enumeration_areaid enumeration_areaid2 region zone district kebele enumeration_code ///
     relation_head sex age_years2015 marital_status2015 hh_s1q07 hh_s1q11 hh_s1q12 ///
     hh_s1q14 father_highest_educ2015 hh_s1q16 hh_s1q18 mother_highest_educ2015 ///
     ever_attended_school2015 currently_attending_school2015 plan_school_nextyear2015 ///
     hh_s4q31 hh_s4q34 hh_s4q37 employed_cash_food2015 casual_labour_work2015 ///
     worked_free_others2015 lat_dd_mod lon_dd_mod
	 
* Rename GPS coordinates
rename lat_dd_mod latitude_modified
rename lon_dd_mod longitude_modified
 
* Drop observations in Region 1 (Tigray)
drop if region == 1

* Save cleaned dataset
save "ESS_2015_hh", replace


*******************************************************
* Merge ESS 2015 with ACLED (Conflict Dataset)
*******************************************************

use "ESS_2015_hh.dta", clear

* Add a constant to facilitate merging
gen n=1

* Merge with conflict dataset
merge m:1 n using "ACLED/cleaned.dta"
drop n

* Calculate distance from households to each conflict site
forval i = 1/420 {
    geodist latitude_modified longitude_modified LATITUDE`i' LONGITUDE`i', gen(distance`i')
}

* Generate survey year variable
gen year_2015 = 2015

* Drop merge indicator
drop _merge

* Save cleaned dataset
save "ESS_2015_hh_cleaned.DTA", replace



*******************************************************
* Merge 2013 and 2015 ESS Data
*******************************************************

use "ESS_2013_hh_cleaned.DTA", clear

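* List fully duplicated observations, then drop duplicates on household_id individual_id
* Note: the merge below keys on household_id2 individual_id2, not on these IDs
duplicates list 

drop _merge
duplicates drop household_id individual_id, force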

* Merge 2013 with 2015 ESS cleaned dataset
merge 1:1 household_id2 individual_id2 using "ESS_2015_hh_cleaned.DTA"
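
* Optional diagnostic (a sketch, not part of the original workflow): tabulate
* the merge result to see how many individuals appear in both waves
* (_merge == 3), only in 2013 (_merge == 1), or only in 2015 (_merge == 2).
tabulate _merge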

* Drop variables not needed for reshaping
drop hh_s1q07 hh_s1q11 hh_s1q12 hh_s1q14 hh_s1q16 hh_s1q18 ///
     hh_s4q31 hh_s4q34 hh_s4q37
	 

* Reshape to long format by year
reshape long hh_weight age_years marital_status father_highest_educ ///
              mother_highest_educ ever_attended_school currently_attending_school ///
              plan_school_nextyear employed_cash_food casual_labour_work ///
              worked_free_others, ///
        i(household_id2 individual_id2) j(Year)

		  
* Save the reshaped dataset
save "ESS_2013_2015_hh_acled.DTA", replace



**************************************************************
* Create Post-Treatment Indicator
**************************************************************
 
use "ESS_2013_2015_hh_acled.DTA", clear


egen pid = group(household_id2 individual_id2)

bysort pid (Year): gen obs_count = _N

tabulate obs_count
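
* Optional check (a sketch, not part of the original analysis): reshape long
* creates one row per Year for every individual, so obs_count equals 2 even
* for individuals surveyed in only one wave. Assuming a non-missing age marks
* an actual interview, the hypothetical variable waves_observed counts the
* waves in which each individual was actually observed. It is dropped again
* so the saved dataset is unchanged.
bysort pid: egen waves_observed = total(!missing(age_years))
tabulate waves_observed
drop waves_observed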

* Create Post-Treatment Indicator

gen post = (Year == 2015)


*** Generate Binary Indicator: Attending School

gen attending_school = 1 if currently_attending_school == 1
replace attending_school = 0 if currently_attending_school == 2

* Create Binary Distance Indicators for Conflict Events
* Distance within 20km

foreach i of numlist 1/420 {
    gen distance_binary_20`i' = (distance`i' <= 20)
}

* Distance within 50km

foreach i of numlist 1/420 {
    gen distance_binary_50`i' = (distance`i' <= 50)
}
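
* Optional check (a sketch, not part of the original analysis): Stata treats a
* missing value as larger than any number, so a missing distance yields 0 in
* the indicators above rather than missing. Assuming distances are missing
* only when household coordinates are missing, this reports how many
* observations are affected.
count if missing(latitude_modified, longitude_modified)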

* Count Number of Conflicts Within Distance Ranges
* Count of conflicts within 20km

gen count_distances_20 = 0
foreach i of numlist 1/420 {
    replace count_distances_20 = count_distances_20 + distance_binary_20`i'
}

* Count of conflicts within 50km

gen count_distances_50 = 0
foreach i of numlist 1/420 {
    replace count_distances_50 = count_distances_50 + distance_binary_50`i'
}


*** Total incidents within 20km and 50km


* Sum total incidents within 20km

gen Total_incidents_20 = 0   

forval i = 1/420 {  
    replace Total_incidents_20 = Total_incidents_20 + total_incidents`i' if distance_binary_20`i' == 1
}

* Sum total incidents within 50km

gen Total_incidents_50 = 0   

forval i = 1/420 {  
    replace Total_incidents_50 = Total_incidents_50 + total_incidents`i' if distance_binary_50`i' == 1
}



*** Total fatalities within 20km and 50km

* Initialize total fatalities variables
gen Total_fatalities_20 = 0   
gen Total_fatalities_50 = 0   

* Sum total fatalities within 20km

forval i = 1/420 {  
    replace Total_fatalities_20 = Total_fatalities_20 + total_fatalities`i' if distance_binary_20`i' == 1
}

* Sum total fatalities within 50km

forval i = 1/420 {  
    replace Total_fatalities_50 = Total_fatalities_50 + total_fatalities`i' if distance_binary_50`i' == 1
}


********** Drop intermediate distance and conflict variables **********

drop distance_binary*
drop total_fatalities*
drop total_incidents*
drop distance*


*** Create dummy variables for conflict exposure based on Incidents ***

* Exposure dummies for 20km and 50km radius (threshold > 5 incidents)

gen exposed_dummy_20 = (Total_incidents_20 > 5) 
gen exposed_dummy_50 = (Total_incidents_50 > 5) 


* Interaction terms between exposure dummies and post-treatment period 

gen interaction_dummy_20 = exposed_dummy_20*post
gen interaction_dummy_50 = exposed_dummy_50*post

* Continuous interaction terms (total incidents * post)

gen interaction_continuous_20 = Total_incidents_20*post
gen interaction_continuous_50 = Total_incidents_50*post


*** Create dummy variables for conflict exposure based on Fatalities ***

* Exposure dummies for 20km and 50km radius (threshold > 25 fatalities)

gen exposed_dumfatalit_20 = (Total_fatalities_20 > 25) 
gen exposed_dumfatalit_50 = (Total_fatalities_50 > 25) 

* Interaction terms between fatality exposure dummies and post-treatment period

gen interaction_dumfatalit_20 = exposed_dumfatalit_20*post
gen interaction_dumfatalit_50 = exposed_dumfatalit_50*post

* Continuous interaction terms (total fatalities * post)

gen interaction_numberfatal_20 = Total_fatalities_20*post
gen interaction_numberfatal_50 = Total_fatalities_50*post


***** Create male dummy from sex variable *****

gen sex_new = 1 if sex == 1
replace sex_new = 0 if sex == 2

* Leave male missing when sex is missing or has an unexpected code
gen male = (sex_new == 1) if !missing(sex_new)

***** Create interaction terms of exposure and male dummy *****

gen inter_dummy_20_sex = interaction_dummy_20*male
gen inter_dummy_50_sex = interaction_dummy_50*male


gen inter_cont_20_sex = interaction_continuous_20*male
gen inter_cont_50_sex =  interaction_continuous_50*male


***** Rename interaction variables to more interpretable names *****

rename interaction_dummy_20 Exposure_20km_dummy
rename interaction_dummy_50 Exposure_50km_dummy
rename interaction_continuous_20 Exposure_20km_continuous
rename interaction_continuous_50 Exposure_50km_continuous

***** Label variables for clarity *****

label variable Exposure_20km_dummy "Conflict within 20km"
label variable Exposure_50km_dummy "Conflict within 50km"
label variable Exposure_20km_continuous "Number of conflicts within 20km"
label variable Exposure_50km_continuous "Number of conflicts within 50km"

***** Rename and label fatalities interaction variables *****

rename interaction_numberfatal_20 Exposure_20km_fatalities
rename interaction_numberfatal_50 Exposure_50km_fatalities

label variable Exposure_20km_fatalities "Number of fatalities within 20km"
label variable Exposure_50km_fatalities "Number of fatalities within 50km"

rename interaction_dumfatalit_20 Exposure_20km_fatalitdummy
rename interaction_dumfatalit_50 Exposure_50km_fatalitdummy


***** Save the dataset *****
save "ESS_2013_2015_hh_acled_final.DTA", replace


* Prepare data and variables for regression


use "ESS_2013_2015_hh_acled_final.DTA", clear

* Create household size variable
* Note: _N here counts all rows per household_id2 in the reshaped data,
* i.e. person-year rows rather than members in a single survey wave

bysort household_id2: gen household_size = _N
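
* Alternative sketch (an assumption, not the authors' definition): a
* wave-specific household size can be obtained by counting, within each
* household and Year, the rows with a non-missing age. household_size_wave
* is a hypothetical name and is not used elsewhere in this file.
bysort household_id2 Year: egen household_size_wave = total(!missing(age_years))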

* Keep only children aged 7 to 18

drop if age_years < 7 | age_years > 18

* Create binary variable for planning to attend school next year (1 = yes, 0 = no)

gen plan_school = 1 if plan_school_nextyear == 1
replace plan_school = 0 if plan_school_nextyear == 2


* Save dataset
save "ESS_2013_2015_hh_acled_final.DTA", replace





